Chromatic Clustering in High Dimensional Space
Authors
Abstract
In this paper, we study a new type of clustering problem, called Chromatic Clustering, in high dimensional space. Chromatic clustering seeks to partition a set of colored points into groups (or clusters) so that no group contains two points of the same color and a certain objective function is optimized. In this paper, we consider two variants of the problem, chromatic k-means clustering (denoted as k-CMeans) and chromatic k-medians clustering (denoted as k-CMedians), and investigate their hardness and approximation solutions. For k-CMeans, we show that the additional coloring constraint destroys several key properties (such as the locality property) used in existing k-means techniques (for ordinary points) and significantly complicates the problem. There is no FPTAS for the chromatic clustering problem, even if k = 2. To overcome the additional difficulty, we develop a standalone result, called the Simplex Lemma, which enables us to efficiently approximate the mean point of an unknown point set through a fixed-dimensional simplex. A nice feature of the simplex is that it is independent of the dimensionality of the original space, so it can be used for problems in very high dimensional space. Using the Simplex Lemma together with several random sampling techniques, we show that a (1 + ε)-approximation of k-CMeans can be achieved in near-linear time through a sphere peeling algorithm. For k-CMedians, we show that a similar sphere peeling algorithm exists that achieves constant-factor approximation solutions.
arXiv:1204.6699v2 [cs.CG] 12 Sep 2012
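One way to see what losing the locality property means: in ordinary k-means, once the centers are fixed, every point simply goes to its nearest center. Under the chromatic constraint, however, two points of the same color cannot share a cluster, so one of them may have to go to a farther center. The Python sketch below is a minimal illustration. It assumes that each cluster may hold at most one point of each color (so each color has at most k points) and that the k-CMeans cost is the total squared Euclidean distance to the assigned centers. The functions `sq_dist`, `nearest_center`, and `chromatic_assign` are hypothetical helpers, not code from the paper. The sketch covers only the assignment step for fixed centers and is not the paper's sphere peeling algorithm. Because the constraint couples only points of the same color, each color class can be assigned independently. Here that is done by brute force over injective maps, which is practical only for tiny k.

```python
from collections import defaultdict
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(points, centers):
    """Ordinary k-means assignment: each point goes to its closest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def chromatic_assign(points, colors, centers):
    """Assign points to centers so that no center receives two points of the
    same color, minimizing the total squared distance.

    Assumption (hypothetical formulation): each cluster holds at most one
    point per color, and cost = sum of squared distances to assigned centers.
    Returns (labels, cost), where labels[i] is the center index of points[i].
    """
    k = len(centers)
    by_color = defaultdict(list)  # color -> indices of points with that color
    for i, c in enumerate(colors):
        by_color[c].append(i)

    labels = [None] * len(points)
    total = 0.0
    for color, idx in by_color.items():
        if len(idx) > k:
            raise ValueError(f"color {color!r} has more than k={k} points")
        best_cost, best_perm = float("inf"), None
        # Each permutation sends the j-th point of this color to a distinct center.
        for perm in permutations(range(k), len(idx)):
            cost = sum(sq_dist(points[i], centers[c]) for i, c in zip(idx, perm))
            if cost < best_cost:
                best_cost, best_perm = cost, perm
        for i, c in zip(idx, best_perm):
            labels[i] = c
        total += best_cost
    return labels, total


if __name__ == "__main__":
    points = [(1, 0), (2, 0), (9, 0)]
    colors = ["red", "red", "blue"]
    centers = [(0, 0), (10, 0)]
    print(nearest_center(points, centers))            # [0, 0, 1]
    print(chromatic_assign(points, colors, centers))  # ([0, 1, 1], 66.0)
```

With centers at (0, 0) and (10, 0), both red points are nearest to the first center. The chromatic assignment must instead send the red point at (2, 0) to the far center, for a total cost of 1 + 64 + 1 = 66. For larger k, each color class is a rectangular assignment problem, which the Hungarian method solves in polynomial time.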
Similar resources
A Divisive Hierarchical Clustering-Based Method for Indexing Image Information
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been made to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Solving the Chromatic Cone Clustering Problem via Minimum Spanning Sphere
In this paper, we study the following Chromatic Cone Clustering (CCC) problem: Given n point-sets with each containing k points in the first quadrant of the d-dimensional space R^d, find k cones apexed at the origin such that each cone contains at least one distinct point (i.e., different from other cones) from every point-set and the total size of the k cones is minimized, where the size of a co...
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
Extraction of Scene Text in HSI Color Space using K-means Clustering with Chromatic and Intensity Distance
Text extraction is an important step that strongly influences the final recognition performance. This task is especially challenging in the case of scene text, which is characterized by a wide set of degradations such as complex backgrounds, uneven illumination, viewing angle, etc. In this paper, we evaluated text extraction based on K-means clustering in HSI color space with chromatic distance and inte...